Estimation and Model Selection for Model-Based Clustering with the Conditional Classification Likelihood

Author

  • Jean-Patrick Baudry
Abstract

The Integrated Completed Likelihood (ICL) criterion was proposed by Biernacki et al. (2000) in the model-based clustering framework to select a relevant number of classes, and it has been used by statisticians in various application areas. A theoretical study of this criterion is proposed. A contrast related to the clustering objective is introduced: the conditional classification likelihood. This yields an estimator and a class of model selection criteria. The properties of these new procedures are studied, and ICL is proved to be an approximation of one of these criteria. We contrast these results with the prevailing view that ICL is not consistent. Moreover, these results give insight into the notion of class underlying ICL and inform reflection on the notion of class in clustering. General results on penalized minimum contrast criteria and on mixture models are derived, which are of interest in their own right.
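
As a minimal sketch of the contrast named in the abstract: for a mixture model with posterior membership probabilities tau_ik, the conditional classification log-likelihood is commonly written as the observed log-likelihood minus the entropy of the tau_ik. The pure-Python helpers below illustrate that decomposition. The function names, the input layout (`log_joint[i][k]` holding log(pi_k * f_k(x_i)), all finite), and the BIC-type penalty in `icl_style_criterion` are assumptions made for illustration and are not taken from the paper.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of finite floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def conditional_classification_loglik(log_joint):
    """Return (log L, entropy, log L_cc) for a fitted mixture.

    log_joint[i][k] is assumed to hold log(pi_k * f_k(x_i)) for
    observation i and component k (hypothetical input layout).
    """
    log_lik = 0.0
    entropy = 0.0
    for row in log_joint:
        row_lse = log_sum_exp(row)        # log of the mixture density at x_i
        log_lik += row_lse
        for v in row:
            log_tau = v - row_lse         # log posterior probability tau_ik
            tau = math.exp(log_tau)
            if tau > 0.0:                 # treat 0 * log(0) as 0
                entropy -= tau * log_tau
    return log_lik, entropy, log_lik - entropy


def icl_style_criterion(log_joint, n_free_params):
    """log L_cc minus a BIC-type penalty (assumed form; larger is better)."""
    n = len(log_joint)
    _, _, log_lcc = conditional_classification_loglik(log_joint)
    return log_lcc - 0.5 * n_free_params * math.log(n)
```

When the components are well separated, the entropy term is close to zero and the criterion behaves like a BIC-type penalized log-likelihood. When components overlap, the entropy term lowers the score, which is why such criteria tend to favor well-separated classes.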

Similar resources

On Model-Based Clustering, Classification, and Discriminant Analysis

The use of mixture models for clustering and classification has burgeoned into an important subfield of multivariate analysis. These approaches have been around for a half-century or so, with significant activity in the area over the past decade. The primary focus of this paper is to review work in model-based clustering, classification, and discriminant analysis, with particular attenti...

Conditional Maximum Likelihood Estimation of the First-Order Spatial Integer-Valued Autoregressive (SINAR(1,1)) Model

Recently a first-order Spatial Integer-valued Autoregressive SINAR(1,1) model was introduced to model spatial data that comes in counts (Ghodsi et al., 2012). Some properties of this model have been established and the Yule-Walker estimator has been proposed for this model. In this paper, we introduce the...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text has motivated researchers to use...

Modified Maximum Likelihood Estimation in First-Order Autoregressive Moving Average Models with some Non-Normal Residuals

When modeling time series data using autoregressive-moving average processes, it is common practice to presume that the residuals are normally distributed. However, we sometimes encounter non-normal residuals and asymmetry in the marginal distribution of the data. Despite widespread use of pure autoregressive processes for modeling non-normal time series, the autoregressive-moving average models have le...

Penalized Bregman Divergence Estimation via Coordinate Descent

Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron et al. (2004) introduced the LARS algorithm. More recently, the coordinate descent (CD) algorithm was developed by Friedman et al. (2007) for penalized linear regression and penalized logistic regression and was shown to be computationally superior. This paper explores...

Publication date: 2012